2 research outputs found

    A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures

    Full text link
    Irregular computations on unstructured data are an important class of problems for parallel programming. Graph coloring is often an important preprocessing step, e.g. as a way to perform dependency analysis for safe parallel execution. The total run time of a coloring algorithm adds to the overall parallel overhead of the application whereas the number of colors used determines the amount of exposed parallelism. A fast and scalable coloring algorithm using as few colors as possible is vital for the overall parallel performance and scalability of many irregular applications that depend upon runtime dependency analysis. Catalyurek et al. have proposed a graph coloring algorithm which relies on speculative, local assignment of colors. In this paper we present an improved version which runs even more optimistically with less thread synchronization and reduced number of conflicts compared to Catalyurek et al.'s algorithm. We show that the new technique scales better on multi-core and many-core systems and performs up to 1.5x faster than its predecessor on graphs with high-degree vertices, while keeping the number of colors at the same near-optimal levels.Comment: To appear in the proceedings of Euro Par 201

    On analysis of partitioning models and metrics in parallel sparse matrix-vector multiplication

    Get PDF
    Graph/hypergraph partitioning models and methods have been successfully used to minimize the communication requirements among processors in several parallel computing applications. Parallel sparse matrix-vector multiplication~(SpMxV) is one of the representative applications that renders these models and methods indispensable in many scientific computing contexts. We investigate the interplay of several partitioning metrics and execution times of SpMxV implementations in three libraries: Trilinos, PETSc, and an in-house one. We design and carry out experiments with up to 512 processors and investigate the results with regression analysis. Our experiments show that the partitioning metrics, although not an exact measure of communication cost, influence the performance greatly in a distributed memory setting. The regression analyses demonstrate which metric is the most influential for the execution time of the three libraries used
    corecore